FOREWORD



This report presents the results of a study of the specifications
for an information system intended to support the design, production
and maintenance of large computer programming systems. Called
Evolutionary System for Data Processing, or ESDP, it was begun as an
internal IBM project in 1965 by the Center for Exploratory Studies
of the Federal Systems Division and continued under Air Force
sponsorship during 1967 and early 1968.

This work has been performed under contract number F19628-67-C0254
for the Electronic Systems Division, U.S. Air Force Systems
Command. The project monitor was Mr. John Goodenough, ESLFE.

The authors wish to express their appreciation for the encourage- 
ment and assistance provided by Dr. John Egan, formerly of ESD, and 
their colleagues Dr. Harlan D. Mills and Mr. Michael Dyer. 

This report is in four volumes: Volume 1, System Description; 
Volume 2, Control and Use of the System; Volume 3, The CAINT Executive 
Language and Instruction Generator; and Volume 4, Programming Specifica- 
tions. This report was submitted on January 31, 1968. 

This report has been reviewed and is approved. 

SYLVIA R. MAYER                  WILLIAM F. HEISLER, Col, USAF
Project Officer                  Chief, Command Systems Division
ABSTRACT



ESDP is a proposed system whose purpose is to acquire,
store, retrieve, publish and disseminate all documentation,
exclusive of graphics, concerned with a large computer
programming activity. Documentation is deemed to consist, not
only of final or formally published after-the-fact reports, but
of working files, design and change notices, informal drafts,
management reports -- in fact, the entire recordable rationale
underlying a programming system. Maximum attention has been
concentrated on the means of acquiring and organizing
documentation. Two major, complementary approaches are proposed.
The first is called Program Analysis and is a process of
extracting documentation directly from completed programs. The
second is called Computer Assisted Interrogation and is a process
of eliciting information directly from the programmer, through
on-line communication terminals. The former provides canonical
data about the program's structure. The latter provides
explanatory material about all aspects of the program, and in the
absence of canonical data, may provide tentative structural
information as well. The conclusion of the study group is that
ESDP is a feasible concept with present-day technology and that
it will materially benefit using organizations in the production
of programs and in guiding their evolution as requirements
change. Its value will be greater for larger organizations,
whose internal communications difficulties tend to cause truly
gigantic inefficiencies. Its implementation as a support system
for such projects would require a significant quantum of
investment in order to produce these benefits and is predicated
on the use of a computer system dedicated solely to the use of
ESDP.
I

SYSTEM DESCRIPTION 



1. General Approach to Programming. The general architecture
proposed for the ESDP system is that used for Operating
System/360 - Queued Telecommunication Access Method (OS-QTAM) (see
Figure 1). In such a system, terminals communicate with the
central processing unit via telephone lines and a multiplexor
channel. In the central processing unit, two or more programs
are operating asynchronously in separate partitions of high speed
memory under control of the OS supervisor.

In one partition, the Message Control Program plus some
additional QTAM code dispatches incoming and outgoing messages.
The Message Control Program makes use of core buffers (the number
and size being specified by the programmer) plus message queue
storage on a direct access storage device.

In the other partitions are the Message Processing
Programs. These programs perform all the ESDP processing
functions. They receive messages from and transmit messages to
the Message Control Program via GET and PUT macro commands. When
a message has been received, an ESDP controller, one of the
Message Processing Programs, must first determine what activity
the sender is involved in. For instance, it must recognize
whether a message is a response to a question in interrogation
or, say, a query. Once this determination has been made, some
type of housekeeping, depending on the particular message and
activity, is performed to initialize the ESDP functional
routines. Program control is then switched to the particular
module of programming required to perform the desired activity.
These modules interact with the system files and issue messages
back to the terminals via PUT commands.
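The controller's dispatch step can be sketched as follows. This is only an illustration in Python, not the PL/I code used in the experimental work; the session table, the activity names and the two handler functions are hypothetical stand-ins for the functional modules.

```python
# Hypothetical sketch of the ESDP controller: classify an incoming message
# by the sender's current activity, then hand it to the matching module.

def handle_interrogation(terminal, text):
    # Stand-in for the interrogation module: the text answers a question.
    return f"{terminal}: answer recorded: {text}"

def handle_query(terminal, text):
    # Stand-in for the retrieval module: the text is a new query.
    return f"{terminal}: query accepted: {text}"

HANDLERS = {"interrogation": handle_interrogation, "query": handle_query}

def dispatch(sessions, terminal, text):
    """Route one message obtained via GET; the reply would be sent via PUT."""
    # Assumption: a terminal with no activity in progress is issuing a query.
    activity = sessions.get(terminal, "query")
    return HANDLERS[activity](terminal, text)

sessions = {"T01": "interrogation"}   # terminal -> activity in progress
reply_a = dispatch(sessions, "T01", "YES")
reply_b = dispatch(sessions, "T02", "FIND UOP PAYROLL")
```

Keeping the activity in a table keyed by terminal, rather than in the module itself, is one way to let a single module serve several terminals at once.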

In addition to providing the capabilities outlined
above, the operating system for ESDP must be concerned with the
following requirements: (1) More than one user terminal may be
communicating with any one program module at a time. This
requirement may best be met by assuring that the program modules
are reentrant. (Note that in our current experimental work we
have used PL/I which produces reentrant code.) (2) Different
user terminals may be communicating with different program
modules at the same time. We feel that this requirement can
probably be met by a multi-tasking supervisor such as that now
used in OS/360 with Multiprogramming with a Variable Number of
Tasks (MVT). This will provide for a primitive form of time
sharing by activating tasks whenever an I/O operation occurs,
making use of a priority system for the tasks. For the system
described, this type of time sharing should suffice, since there
should be no periods of long processing, uninterrupted by I/O
commands. (3) More than one user terminal may be accessing any
one data element at the same time. This will require that some
form of data base lockout be placed into the system.
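One possible form of the lockout in requirement (3) is a lock held per data element, sketched below in Python. The report does not specify a mechanism, so the class, the element name and the use of threads in place of OS tasks are all assumptions.

```python
import threading
from collections import defaultdict

class ElementLocks:
    """Hypothetical lockout scheme: one lock per data element name."""

    def __init__(self):
        self._guard = threading.Lock()             # protects the lock table
        self._locks = defaultdict(threading.Lock)  # element name -> lock

    def lock_for(self, element):
        with self._guard:
            return self._locks[element]

locks = ElementLocks()
record = {"UOP-17": 0}

def update(element):
    # Only one task at a time may modify a given element.
    with locks.lock_for(element):
        record[element] += 1

threads = [threading.Thread(target=update, args=("UOP-17",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Locking at the level of the element rather than the whole file lets terminals working on unrelated elements proceed without waiting.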
Figure 1. General ESDP System Concept



The general concept of ESDP is for a teleprocessed
terminal-oriented system. The terminals themselves should
comprise the following:

a. A cathode ray tube display with keyboard entry. 

Most of the conversational processes will be performed
through this device. Light pen capability and/or vector drawing
capability may be desired depending on the need for activities
such as production of graphics in the documentation. For the
strictly conversational documentation activities, generation of
an average-sized character set (e.g., 64 character set including
numbers and upper case letters) on the face of the CRT should
suffice.

b. Hard copy printer. 

It is often desirable to retain a hard copy of that
which has been displayed on the CRT. This can be accomplished
via a typewriter type printer (without keyboard), the printing
being activated by command from the keyboard associated with the
CRT.

c. Line printer

Line printing should be centralized so that high volume
outputs can be generated in the machine room for subsequent
manual transmittal to the requesting user.

d. Terminal polling 

A round-robin polling system with priorities such as
that used by QTAM seems appropriate. If this proves
inefficient, the priority scheme could be revised so as to
be based on the particular activity, for instance, rather than
simply the terminal identification.
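A minimal Python sketch of one polling cycle with terminal priorities follows. It does not reproduce QTAM's actual polling lists; the terminal names, the priority numbers and the message queues are hypothetical.

```python
def poll_cycle(terminals, pending):
    """Visit every terminal once, highest priority (lowest number) first.

    Terminals of equal priority are visited in their listed order, and at
    most one waiting message is taken from each terminal per cycle.
    """
    served = []
    for name, _priority in sorted(terminals, key=lambda t: t[1]):
        if pending.get(name):
            served.append((name, pending[name].pop(0)))
    return served

terminals = [("T1", 2), ("T2", 1), ("T3", 2)]      # (terminal, priority)
pending = {"T1": ["a1", "a2"], "T2": ["b1"], "T3": []}

first = poll_cycle(terminals, pending)
second = poll_cycle(terminals, pending)
```

Basing priority on activity instead would only change the sort key, for example to a priority looked up from the activity each terminal is engaged in.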

It is anticipated that during hours when normal ESDP
documentation activity is light, other programs can be run that
are not under the general QTAM-ESDP set up. Examples of such
programs are:

File cleanup programs -- It may be necessary to move data
on the direct access storage devices in order to reuse space
freed via deletion of records. This reorganizing is one type of
file processing that might be performed off-line. In addition,
there are normal utility functions such as disk copying, disk
printing, etc., that could fit in this category.

Pre-processors -- There may be some pre-processing
desired for the CAINT Executive Language. This is particularly
true when debugging macros are to be used. Such pre-processors
could operate off-line.



2. Hardware Assumptions. Hardware has not been considered in
this study, except indirectly, when feasibility of attaining
various objectives was considered. There were, however, some
basic hardware assumptions underlying the study. These are:

a. Machine Utilization 

A computing system will be dedicated to ESDP.

b. Machine Type 

A System/360 computer with Operating System/360 was
used for the experimental programming in this project. This
choice of hardware, of course, is not mandatory. However, much
of the discussion in this report is based on S/360 with OS and,
therefore, uses that terminology.



II 

DATA BASE 



We foresee the need for several files, or, in the
terminology adopted herein, file sets. While it would be
possible to store most of the information to be described below
in one monolithic file, this breakdown recognizes differing
frequencies of file modification, different processes to be
performed on data, and different means of control of access.

1. Program Description File Set. This file set contains a
logical record for each unit of programming. Its organization
will bear a close resemblance to the outline of a conventional
program description, but there is no permanent standard and it is
expected and encouraged that the content and composition of this
file will be shaped by the users to fit their own needs.

The major subjects to be covered, in a generalized form
of the file, are:

o Identification -- of the program, programmer, date,
  etc.

o Program Structure -- in terms both of the
  hierarchical structure of the program and of the
  branching, or control, structure.

o Data References -- the data items named by the
  program and the nature of their use.

o Logic Description -- both symbolic and natural
  language descriptions of what the program does,
  how, why.

o Management and Status Data -- information relative
  to the program as an item being produced, its
  schedule, progress, problems, etc.

o Illustration References -- references to flow charts
  and tables to be composed by ESDP and to be
  printed with this program description information.
  Also, references to other graphics, used for
  illustration, which are not able to be stored
  within the ESDP computer.

Except for the identification section, for which no amplification
is necessary, these items are discussed below in greater detail.

a. Program Structure

This section would contain pointers to related UOP's.
There are two general categories of relationship: hierarchical
and control. Hierarchical pointers would indicate subordination
or superordination, and control pointers would indicate entry
points or predecessor and successor UOP's. Entries from, or
exits to, label variables would be treated as a special case,
with the variable representing a program switch which might be
given a form of UOP status. Also contained in this section would
be a codification of the type of branch control (whether
unconditional, such as a PL/I GO TO; or conditional, such as an
IF or DO) and the variables that affect the branch. In addition,
there would be narrative explanations of the control logic, or
pointers to such explanations.

b. Data References 

The exact extent to which data documentation should be
split or duplicated between the program and the data description
files depends on the philosophy of management of the object
system. At a minimum, this section of a program description file
must list the data elements that occur in the program, and must
give the nature of the usage, such as a control variable (a
variable that directly affects a branching decision), a computed
value (set by an assign or DO statement) or any of a number of
other categories of usage. The bulk of the actual description of
the data, as differentiated from the codification of the nature
of its use, will be carried in the Data Description File Set.

c. Logic Description 

The essence of logic description is to tell what the
program does, how, and why. But each of these points is
subdividable and answerable on more than one level of
abstraction. Hence, there may be a large number of individual
items contained in this file. It must be emphasized that the
logic description is not restricted to narrative data only. Many
aspects of a program's function can be codified.

d. Management and Status Data 

Perhaps more than any other portion of the
documentation files, this will be the one most often changed to
suit individual user needs. The information to be collected and
the processing to be performed on it is largely a function of the
individual manager, who must satisfy his own requirements for
information plus those of his higher management and the
contractual requirements for system documentation.

The obvious kind of information to be collected has to
do with progress in meeting schedules and milestones, manpower
assigned, problems being met or anticipated, and budget data.
Mostly, the information will be collected by interrogation, but
some, such as number and dates of compilation, program lengths,
etc., can be acquired automatically.



e. Illustration References

We may group illustration material into three classes:
flow charts or tables that are required according to the system
documentation plan, flow charts or tables volunteered by the
programmer or file documentor to augment some aspect of his
narrative, and other illustrative material that is volunteered,
but is not in flow chart or table form. The restriction on flow
chart and table form arises, of course, from the planned use of
standard graphic programs that will produce these configurations
easily, but cannot handle the full range of graphic input. Thus,
if a programmer wishes to input a logarithmic graph, or a diagram
of two aircraft on a collision course, he will probably have to
draw these in the conventional manner, but include in the
machine-stored documentation a reference to the illustration
copy.

In the program description file set will be stored only
pointers to the detailed illustration information, whether or not
stored within ESDP. We recommend this separation because these
files will be large, and, while they may be updated whenever the
program is, they will rarely be subject to information retrieval
searches in the same way as detailed data in the remainder of the
file. A user may want to retrieve the information that is
displayed on a chart, but he will not normally want to retrieve
the detailed FLOWCHART instructions that organize the display.

2. Data Description File Set. The recommended approach for
documentation of data files is to create a data description
record for each file or structure used in any program, with as
many working records as needed to give each user a chance to
document the data the way he prefers. Another version of the
descriptive record will be created and maintained by an
authorized COMPOOL, or data base, controller in whom will be
vested authority to make final decisions on data definitions and
attributes, and who can delete working records at will. His
expected mode of operation, then, should be to review his data
items periodically, look at the conflicting descriptions or
requests of the individual programmers, make his decision on
which version to accept or declare, and put the final decisions
into the permanent record.

Provision can be made, using the dissemination 
services, to promulgate the data base controller's decisions 
immediately to all programmers concerned. 

The organization of the data description record will be
similar to that for a program. It will have the following major
headings:

o Identification

o Element description -- the narrative and
  other information about the item and its
  use.

o Structure -- primarily, this is used for
  higher level structures in order to
  define the subordinate structure.

o Using program references -- pointers,
  possibly some supporting text on what
  effect the particular using program has
  on the data.

o Illustration references

3. Program File Set. This file set will contain the text of the
programs being documented. In addition, consideration will be
given to storing, within this file set, program change
information, separate from the program itself. This will permit
a record to be kept of all changes, and this will permit
programmers to make changes either to the latest version of a
program, any previous version, or both simultaneously. The
complete text of any version of the program could be retrieved on
request. Another possible class of information for this file set
is partially reduced program analysis data. This would be
intermediate output, produced during a program analysis run,
which could be saved to reduce the time required to process a
change to the program.
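Storing change information apart from the program text means that any version can be rebuilt by replaying changes against the original. The Python sketch below illustrates the idea; the change-record format (an operation name, a 1-based line number and, where needed, new text) is an assumption made for this example and is not specified in the report.

```python
def apply_change(lines, change):
    """Apply one change record to a list of source lines; return a new list."""
    op, n = change[0], change[1]
    out = list(lines)
    if op == "ADD":          # insert new text after line n (0 = at the top)
        out.insert(n, change[2])
    elif op == "CHANGE":     # replace line n
        out[n - 1] = change[2]
    elif op == "DELETE":     # remove line n
        del out[n - 1]
    else:
        raise ValueError(f"unknown operation: {op}")
    return out

def version(base, changes, k):
    """Rebuild the program text as it stood after the first k changes."""
    lines = list(base)
    for change in changes[:k]:
        lines = apply_change(lines, change)
    return lines

base = ["P: PROC;", "X = 1;", "END P;"]
changes = [("CHANGE", 2, "X = 2;"), ("ADD", 2, "Y = X;"), ("DELETE", 2)]

original = version(base, changes, 0)
latest = version(base, changes, len(changes))
```

Because the base text is never overwritten, a change made to an earlier version is simply a new change list that branches from a shorter prefix of the old one.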

4. Graphic Coding File Set. We recommend the use of a program
for the automatic production of flow charts and tables. In some
systems, such as FLOWCHART [1], graphics are assembled by the
issuance of commands on how to build them, in a manner similar to
computer programming. The Graphic Coding File Set would contain
these instructions.

The graphic files may be updated separately from the
program or data files they illustrate, but this form of updating
should probably be restricted to changes in layout. Changes in
content or structure should be keyed to changes in the data or
programs being described, although the initiative for a change
may originate with either a program or data description change or
with a graphic change.

5. Publication File Set. This file set will contain partially
processed documentation, taken from any of the other files. The
preparation of copy for publication can be a time-consuming
process. Hence, partially edited material should be retained in
machine readable form for reprinting or for selection for
inclusion in differently-organized documents. This form of
storage is used by the IBM Administrative Terminal System [2], a
text processing system, a successor of which is recommended for
use in ESDP.



6. Instruction Course File Set. Instructional material produced
through ESDP plays a dual role. It is used in its own right as a
system program, but it is also a form of documentation and
changes in it must be keyed to changes in the programs or data
being taught. Hence, instruction courses can be stored as
programs, but must have appropriate pointers back and forth to
the documentation from which they were derived.

7. Dissemination File Set. These files will contain the profile
and distribution lists needed to operate the internal ESDP
dissemination system on documentation and changes thereto.

8. Index File Set. These files are those indexes and inverted
indexes used by the information retrieval system to carry out its
functions. These are also dynamic files, which are subject to
frequent change as the documentation files change.



9. Buffer File Set. Buffer files are dynamically created files
and are for use by the information retrieval system.



III

PROGRAM ANALYSIS 



1. General. A Program Analysis (PA) program has been produced
by IBM [3] as part of its own internally sponsored ESDP
activities. This program accepts input in PL/I, OS/360 Job
Control or OS/360 Linkage Editor languages and compiles a
canonical or structural data file descriptive of the hierarchical
and control structure of the programs and their usages of data.

Source Code Analysis is performed by a set of compiler- 
like analyzers which are oriented to a particular language. The 
number of analyzers is dependent on the make-up of the user's 
programming system. The role of each analyzer is identical 
regardless of the language, namely to map source code into the 
UOP coordinate structure and generate the data records associated 
with it. In this way, each analyzer, which is necessarily 
language dependent, can effect a common interface with the 
system. 

Three analyzers have been written, for OS/360 Job
Control Language (JCL), OS/360 Linkage Editor Language, and
OS/360 PL/I Language. This sample was selected to permit
experimentation with programming systems written primarily in
PL/I, of which the analyzer, itself, is an example. We note again
that the treatment of run-time languages (e.g., JCL) as
programming languages is a critical point since it is key to the
automatic analysis of system-wide interactions.

Current compilers and assemblers generate source
code listings and cross reference lists for data variables for a
single program at compilation time. However, this is ordinarily
the extent of their automatic capabilities. Additional
programming information on program interactions within a larger
system, rationale behind program logic and program groupings,
data flow through the system, and so on, is necessarily based on
interrogation-acquired documentation.

The analyzer parses the source program into elements
called Units of Programming (UOP). The current program produces
UOP's at the following levels:

JOB

LOAD MODULE

SOURCE MODULE (Compilation Unit)

CALL MODULE - Procedure block

GROUP - BEGIN block
        DO group
        IF compound statement
        ON compound statement

SEGMENT

For each UOP in a structure, a data record is created 
which contains the appropriate structure, logic and data usage 
information. This structure and logic data of the individual UOP 
records is also the mechanism for creating the total structure of 
a program system or any of its major components. 

To make the programmer aware of how his program is
structured, a revised program listing is generated, which
graphically depicts the coordinate structure and labeling for
this program. This revised listing is useful, not only as a
guide to the files, but also as a picture of program execution
which may easily become obscure in the compiler generated
listing, particularly with free format languages where multiple
statements can be strung together in a single print line.

System-wide interactions of a program can be obtained 
through the automatic analysis of the Object Module generated by 
the compiler and the JCL deck that would be written for 
execution. 

Object module analysis yields information regarding
program interaction (external procedure in PL/I terms) in a
system. This process consists, basically, of extracting symbols
from the external symbol dictionaries of all the object modules
involved in a given linkage editor run, detecting module
cross-references and discriminating between data references and
branch references.



This information is critical when sets of programs are
linked together and manipulate the same data, since this is the
source of most problems and delays during integration of
programming systems where the various pieces were written and
debugged by different people.

Within OS/360, the execution of a program would require
a JOB Control deck or program. The analysis would equate, within
the UOP records, the file declarations at the JOB level with all
references to these files down to the SEGMENT level. In a more
complex case, where condition codes and multiple job steps were
defined, this same correlation of program units and data usage
would have added significance.

2. Operation of Program Analysis. The analysis of PL/I code is
performed in several phases. In Phase 1, PL/I source code is
read in, and blanks, comments, and constants are eliminated. The
remaining characters are translated through use of a translation
table. The general effect of this translation is to replace the
source language string with numeric codes in such a way that
alphanumeric strings are grouped in the higher number codes,
special characters in the lower number codes, and operation codes
in the middle. This is done so that in future processing, simple
limit testing can be used on the codes to determine the type.
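The value of grouping the codes by range can be seen in a small Python sketch. The translation table below is hypothetical: the report does not give the actual code assignments, so the character sets and the boundaries at 64 and 128 are chosen only to show how two limit tests classify any code. The sketch removes blanks but does not attempt the removal of comments and constants.

```python
# Hypothetical translation table: special characters get the lowest codes,
# operators the middle range, and alphanumerics the highest range.
SPECIAL = ";:,()."
OPERATORS = "+-*/=<>&|"
ALNUM = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_$#@"

TABLE = {}
for i, ch in enumerate(SPECIAL):
    TABLE[ch] = 1 + i
for i, ch in enumerate(OPERATORS):
    TABLE[ch] = 64 + i
for i, ch in enumerate(ALNUM):
    TABLE[ch] = 128 + i

def translate(source):
    """Drop blanks and map each remaining character to its numeric code."""
    return [TABLE[ch] for ch in source if ch != " "]

def kind(code):
    # Two limit tests are enough to determine the type of any code.
    if code < 64:
        return "special"
    if code < 128:
        return "operator"
    return "alphanumeric"

codes = translate("X = Y + Z;")
kinds = [kind(c) for c in codes]
```

Later phases can then ask "is this an identifier character?" with a single comparison rather than a table search.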

An output string is organized in which each statement 
is given a statement number. 

The statements are scanned and whenever labels, file
names, parameters, or condition names are encountered, a
dictionary entry is created, and a pointer to the dictionary
entry is stored with the statement.



When UOP defining statements (e.g., DO, BEGIN) are
encountered in the scan, entries are made in a parsing table.
Then, when statements are encountered that end UOP's (e.g., END),
the table is searched to determine which entry is closed. The
table then contains the statement numbers defining the limits of
the UOP's.
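The open-and-close bookkeeping of the parsing table can be sketched in Python as follows. This is a simplified illustration, not the PA program's logic: it recognizes only PROC, BEGIN and DO at the start of a statement (after an optional label), and assumes every END closes the most recently opened entry, which ignores PL/I's rule that a labeled END may close several blocks at once.

```python
OPENERS = ("PROC", "PROCEDURE", "BEGIN", "DO")

def build_parsing_table(statements):
    """Record the first and last statement number of each UOP."""
    table = []
    for number, stmt in enumerate(statements, start=1):
        words = stmt.rstrip(";").replace(":", " : ").split()
        if len(words) >= 2 and words[1] == ":":   # skip a leading label
            words = words[2:]
        keyword = words[0] if words else ""
        if keyword in OPENERS:
            table.append({"kind": keyword, "first": number,
                          "last": None, "closed": False})
        elif keyword == "END":
            # Search backward for the most recent entry still open.
            for entry in reversed(table):
                if not entry["closed"]:
                    entry["last"] = number
                    entry["closed"] = True
                    break
    return table

program = ["P: PROC;", "DO I = 1 TO 3;", "X = X + I;", "END;", "END P;"]
table = build_parsing_table(program)
limits = [(e["kind"], e["first"], e["last"]) for e in table]
```

Here the DO group spans statements 2 through 4 and the procedure spans statements 1 through 5.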



At the completion of Phase 1, the parsing table has
been filled, and the dictionary has been partially filled.



Phase 2 reformats the source text, indicating the
parsed units and statement numbers. The units are indicated in
such a way as to ease reading.

At this point in the Program Analysis process, two
internal tables have been built -- the dictionary and the parsing
table. Their formats are described below.

a. Dictionary 

(1) Bytes 1-2 are reserved for a hash chain
during the first three phases of analysis and for a data type
code during the latter phases.

(2) Bytes 3-6 contain the scope data for the
entry in terms of procedure and statement numbers.

(3) Bytes 7-8 contain, for structures, pointers to
structure elements, and for labels, the statement number of the
label declaration.

(4) Bytes 9-28 contain the identifier as it
appears in the source code.

(5) Bytes 29-32 contain a bit table that defines
the unique attributes or characteristics of the entry.

(6) Bytes 33-40 are a set of offset values that
point to the overflow area. Note that certain PL/I attributes
carry value information, e.g., precision, bounds of dimensions,
file environments, etc. This value data is stored in the
overflow area and the offsets are used to delimit the start and
stop of various values.

(7) Bytes 41-119 contain any values associated
with attributes.

Figure 2 illustrates a typical dictionary entry. 
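The byte layout of a dictionary entry can be expressed with Python's struct module, as sketched below. Several details are assumptions made for illustration: the scope field is split into two halfwords (procedure number and statement number), the value area is taken to occupy bytes 41-119 so that no fields overlap, and the identifier is encoded in EBCDIC (code page 037) padded with blanks.

```python
import struct

# ">" selects big-endian byte order with no padding between fields.
# Widths: hash chain 2, scope 2+2, structure pointer 2, identifier 20,
# attribute bits 4, offsets 8, attribute values 79 -> 119 bytes in all.
ENTRY = struct.Struct(">H HH H 20s I 8s 79s")

def pack_entry(name, proc_no, stmt_no, attributes):
    """Build one dictionary entry; unused fields are left as zeros."""
    ident = name.ljust(20).encode("cp037")
    return ENTRY.pack(0, proc_no, stmt_no, 0, ident,
                      attributes, bytes(8), bytes(79))

def identifier(raw):
    """Recover the identifier from bytes 9-28 of a packed entry."""
    return ENTRY.unpack(raw)[4].decode("cp037").rstrip()

raw = pack_entry("COUNT", proc_no=3, stmt_no=12, attributes=0x80000000)
fields = ENTRY.unpack(raw)
name = identifier(raw)
```

A fixed-length entry of this kind lets the dictionary be addressed by simple multiplication, with only the variable-length attribute values pushed out to the overflow area.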

b. Parsing Table 

To delimit UOP structures and keep track of UOP
nesting, this table is generated during the analysis. It also is
declared as an array of bit strings where each element represents
a single UOP. The format of each element is as follows:

(1) Bit 1 - status switch used to determine if
UOP has been closed.

(2) Bits 2-9 - UOP level code where code runs
from 1-6, corresponding to JOB level to Segment level.

(3) Bits 10-73 contain the procedure name or
label on the including procedure.

(4) Bits 74-85 contain the statement number of
the first statement in the UOP.

(5) Bits 86-97 contain the statement number of
the last statement in the UOP.

(6) Bits 98-106 contain a dictionary pointer to
the label associated with the UOP, if any.

Figure 3 illustrates a typical parsing table entry. 
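The bit layout of a parsing table element can be illustrated by packing the six fields into a single integer, as in the Python sketch below. The field names are hypothetical, and the procedure name is assumed to be eight characters stored one per byte to fill the 64-bit name field.

```python
# (field name, width in bits), in order from bit 1 to bit 106.
FIELDS = [("closed", 1), ("level", 8), ("name", 64),
          ("first", 12), ("last", 12), ("dict_ptr", 9)]

def pack_entry(values):
    """Pack the fields into one 106-bit integer, bit 1 being the highest."""
    word = 0
    for field, width in FIELDS:
        value = values[field]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{field} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack_entry(word):
    """Recover the fields, peeling them off from the low-order end."""
    values = {}
    for field, width in reversed(FIELDS):
        values[field] = word & ((1 << width) - 1)
        word >>= width
    return values

name_bits = int.from_bytes(b"PAYROLL ", "big")   # 8 characters -> 64 bits
entry = {"closed": 1, "level": 4, "name": name_bits,
         "first": 1, "last": 250, "dict_ptr": 17}
packed = pack_entry(entry)
total_bits = sum(width for _field, width in FIELDS)
```

The 12-bit statement number fields imply a ceiling of 4095 statements per analyzed program, and the 9-bit pointer a ceiling of 511 addressable dictionary entries, under this reading of the layout.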
       Hash    Scope   Structure   Data    Attri-   Table of   Overflow
       Chain           Pointers    Name    butes    Offsets    Area

Byte   1       3       7           9       29       33         41

Figure 2. A Typical Dictionary Entry.



       Switch   Level    Including   First       Last        Dictionary
                Number   Procedure   Statement   Statement   Pointer
                         Name        Number      Number

Bit    1        2        10          74          86          98

Figure 3. A Typical Parsing Table Entry.
The string generated in Phase 1 is read in Phase 4, and
a new string is produced that is completely coded. All
identifiers are replaced with dictionary pointers.

Phase 5 determines data type for all data in the 
dictionary and adds a data type code to the dictionary. 

Phase 6 reads in the parsing table and reads in the
program statements, one at a time. From these it generates the
UOP records and with the additional input of the dictionary it
generates the trailer records. These are written out on tape.

The JCL cards are used by the JCLSCAN Program, and each 
card is examined to determine if it is a JOB card, an EXEC card, 
a DD card, or other. Cards in the other category are immediately 
rejected. JOB cards are further examined for condition codes at 
the JOB level. If they exist, they are stored on an analysis 
list. 

For EXEC cards, the program stores the job step name in
the analysis list and then determines if the name refers to a
catalogued procedure. If it does, the name is marked as job
level. If it does not, the name is marked as load module level.
The EXEC card is then checked for JOB STEP parameters, and if
there are any, they are stored in the analysis list. The same
process is followed for JOB STEP condition codes.

For DD cards, the DSNAME is stored in the analysis list
along with any disposition parameters.

After all of the JCL cards have been read, the analysis
list is further processed, the process varying with the type of
JCL statement.

JOB - The UOP name is extracted from the job statement
label field. The entry and exit portions of the UOP are marked,
and if condition codes exist, subordinate UOP's are marked as exit
points.



DD - A data reference entry is made in the UOP for the
DD.



EXEC - JOB STEPS become subordinate units to the JOB
UOP. The UOP names are the JOB STEP names. If JOB STEP
parameters exist, a data reference entry is made using a dummy
name. If JOB STEP condition codes exist, the subordinate
transfer table is marked accordingly.
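The first pass of the JCL scan, which sorts cards into JOB, EXEC, DD and other, can be sketched in Python as follows. This is not the JCLSCAN program itself: the function names are hypothetical, continuation cards are ignored, and the sample deck is invented for illustration.

```python
def classify(card):
    """Return (kind, name) for one card image; kind is JOB, EXEC, DD or OTHER."""
    if not card.startswith("//"):
        return ("OTHER", "")
    fields = card[2:].split()
    if not fields:
        return ("OTHER", "")
    if card[2:3] == " ":              # no name field on this card
        name, rest = "", fields
    else:
        name, rest = fields[0], fields[1:]
    operation = rest[0] if rest else ""
    if operation in ("JOB", "EXEC", "DD"):
        return (operation, name)
    return ("OTHER", name)

def scan(cards):
    """Build the analysis list, rejecting cards in the 'other' category."""
    analysis = []
    for card in cards:
        kind, name = classify(card)
        if kind != "OTHER":
            analysis.append((kind, name))
    return analysis

deck = [
    "//PAYROLL JOB (123),'SMITH'",
    "//STEP1 EXEC PGM=PAYCALC",
    "//MASTER DD DSNAME=PAY.MASTER,DISP=OLD",
    "/*",
]
analysis = scan(deck)
```

The second pass would then walk this list to build the JOB UOP, its subordinate job steps, and the data reference entries for each DD.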

The Linkage Editor Analysis Program (LEAR) begins by
reading from the primary input stream. A test is made to
determine if the first entry in the stream is a linkage editor
command. If it is not, the entry is processed as an object
module. If it is, another test is made to determine if the
command is an INCLUDE statement. INCLUDE statements effect
readings from secondary input streams. All other command types
are ignored. Object module processing continues in the primary
stream until an INCLUDE is found. Then, processing shifts to the
secondary stream. In the secondary stream, the first column of
the card image is checked for a blank (indicating command), a
12-9-2 punch (indicating an object module), or any other
(indicating a load module). The first two are processed as
previously discussed and the third (load module) causes a load
module subordinate unit entry to be established.

Once all of the linkage editor object modules, load
modules and commands have been processed, a UOP record is formed.
This UOP is in the same format as a PL/I UOP.

3. Additional Requirements. There must be added to the program
analysis implementation an incremental analysis capability. When
a programmer makes a change to an existing program, he should not
have to run the entire program through analysis again. This
process now takes an amount of time on the same order as a full
compilation, hence in a large system it could become a
significant drain on computer capacity if repeated often.
Instead, the approach recommended is to have ESDP store the
latest copy, and let the programmer make changes by use of ADD,
CHANGE, and DELETE commands, treating his stored program as a
file. In this way, PA need only analyze the changes and make
minimal modification to the canonical data file, and new
interrogations can be initiated only on those portions of the
program affected.

More detailed information than that produced by the
present PA program is needed for classifying the manner in which
data labels are used or referred to within the program text. For
a variety of reasons (e.g., making up more detailed
interrogations, assisting in test planning, assisting in
debugging, and providing better cross indexing of documentation),
we feel that data usage should be classified in as much detail as
possible. Furthermore, the information desired is available
through the program analysis function, but currently is discarded
rather than saved (this is also true of compilation). A
hierarchical classification code should be used for each
appearance of a data label. This code should reflect whether the
item is changed by this usage or not; whether it is changed by
being assigned a new value or having a new value read in;
whether it is used without being changed; whether, when used in
an assignment statement, it is used as an item in itself, or an
index to another item, a control item, etc. A first
approximation to such a classification system is given in Figure
4, which also appeared in Volume 1.
1. Context of Appearance

   1.1 Assignment Statement
       1.1.1 Computed Value
       1.1.2 Argument

   1.2 Control Statement
       1.2.1 Variable I/O Command
       1.2.2 Branching or Transfer Command
             1.2.2.1 Argument or condition statement (IF, ON ...)
             1.2.2.2 Iterative Control Variable (DO)
                     1.2.2.2.1 Initial index value
                     1.2.2.2.2 Increment
                     1.2.2.2.3 Maximum value or limit
             1.2.2.3 Variable address

   1.3 Subroutine/Function/Macro Calling Sequence
       1.3.1 Transmitted to SR/Function/Macro
       1.3.2 Received from SR/Function/Macro

   1.4 Data Declaration Statement (or other non-executable
       statement)

   1.5 Input/Output
       1.5.1 Input
             1.5.1.1 Input Control Variable
             1.5.1.2 Data Element read in
       1.5.2 Output
             1.5.2.1 Output Control Variable
             1.5.2.2 Data Element written out or transmitted

2. Change Status

   2.1 Value Changed by Containing Statement
       2.1.1 Value Directly Assigned by Assignment Statement
       2.1.2 Value Directly Changed by DO Statement
       2.1.3 Value Directly Changed by Variable I/O Statement

   2.2 Value not Changed by Containing Statement

3. Structural Role

   3.1 Data Element is a Structure or Array

   3.2 Index or Subscript
       3.2.1 Value of an Index
       3.2.2 Element of an Index Term

   3.3 Scalar Item



Figure 4. Classification of Data Usage by a Program.
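
A hierarchical code of this kind lends itself to prefix matching:
a usage belongs to a category if its code begins with the code of
that category. The following Python sketch is illustrative only.
The function names and the sample record are assumptions and are
not part of the PA program.

```python
def parse_code(code: str) -> tuple:
    """Turn a dotted code such as '1.2.2.2.1' into (1, 2, 2, 2, 1)."""
    return tuple(int(part) for part in code.split("."))


def falls_under(code: str, category: str) -> bool:
    """True if `code` is `category` itself or one of its subdivisions."""
    c, k = parse_code(code), parse_code(category)
    return c[:len(k)] == k


# Hypothetical record for the appearance of label I in "DO I = 1 TO N":
appearance = {
    "label": "I",
    "context": "1.2.2.2.1",  # initial index value of a DO
    "change": "2.1.2",       # value directly changed by DO statement
    "role": "3.3",           # scalar item
}

# Is the value changed by the containing statement (category 2.1)?
is_changed = falls_under(appearance["change"], "2.1")
```

Comparing the codes as tuples of integers, rather than as strings,
keeps a code such as 1.20 from being mistaken for a subdivision of
1.2.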
Another aspect of program analysis (or possibly of
information retrieval) to be borne in mind is that, as the
documentation files grow large, there will be inevitable errors,
such as programmers misnaming programs, submitting the wrong
version of a program for analysis, entering changes incorrectly
(resulting in an actual program that differs from what the author
thinks it is), etc. These are normal mistakes of any programming
project and, in a purely manual system, they can be tolerated and
relatively easily reversed. The documentation file system and
the program analysis system must be so designed as to anticipate
such errors and, while it is not ESDP's responsibility to detect
them, it should be possible within ESDP to correct them with
minimum difficulty.
CONVERSATIONAL PROCESSING



A thorough description of the conversational language
and processing required is contained in Volume 3 of this report.
Programming of the conversational routines represents a
considerable share of the entire ESDP implementation effort. The
scope, techniques and other aspects of the programming required
should become evident from the discussion of the conversational
activities and language contained in that volume and therefore
will not be further discussed here.
V
INFORMATION RETRIEVAL



1. ESDP Files. The files handled in the ESDP system include the
following:

a. Program Description File Set

This file set contains a record for each UOP in the
object system. The information in the file may be derived
through program analysis, interrogation, or both, with the source
being identified.

b. Data Description File Set 

This file contains a record for each Unit of Data 
(UOD). Here, the information is obtained through interrogation 
only. UOP and UOD are linked via pointers since the data are 
referenced in UOP. 

c. Index File Set 

A file is built of keywords, indicating for each keyword the
UOP or UOD and IEN associated with it. The keywords are
extracted automatically at the time that a response to a question
is entered during interrogation. At that time, the name of
the UOP currently the subject of the interrogation and the IEN
associated with the question are appended to all keywords in the
response.

d. Buffer File Set 

Provision is made in the ESDP concept for general file 
handling capabilities. Programs that interpret file format 
tables for file accessing will be included. In addition, file 
building may be done on line as well as off line. The intended 
use of the special files is as personalized subsets of the ESDP 
data base. It is anticipated that this feature would be heavily 
used by system managers to create, update, and search 
personalized management information systems.

2. File Building and Maintenance. Creation and modification of
UOP records and UOD records are planned in ESDP to match the
changes in the object system of programs. Records are created
whenever the system becomes aware of a new UOP or UOD. This
information may be acquired in any one of a number of ways:

a. Source Code Parsing

Program Analysis creates UOP's by parsing source
language code. UOP's created are named either by program label
or by a combination of containing UOP name and statement numbers.
b. Source Code References

References may be made in the source code to UOD or
other UOP not subject to Program Analysis. The appearance of
these references in source coding will cause the creation of the
appropriate records, and names will be taken from the source code.

c. Interrogation

UOP or UOD may also be named by the programmer at a
console in the interrogation process. This can occur during
design interrogation or during interrogations performed after the
object system program has been subjected to program analysis.
Naming of these UOP's and UOD's is a simple process since the
programmer assigns the names.



Whenever changes to source code are submitted for
analysis or incremental interrogations are processed, changes to
data items in existing UOP or UOD records are likely to take
place. The changes can take the form of ADD, DELETE, or REPLACE
(DELETE and ADD). The way in which the system handles the file
updating will depend on the data elements to be changed [...]
manner in which the requested change [...] system.

[...] concept for ESDP file updating as a result [...] programs
is to rerun Program Analysis on the [...] changed UOP. Old UOP
records will not be [...] Through the reconciliation process,
[...] with the old UOP records will be linked to [...] When the
reconciliation has been completed, [...] will be deleted. Keyword
references must be [...] reconciliation process. If text from an
IEN [...] of a record is to be moved to another IEN of [...]
keyword updating amounts to changing the IEN [...] appropriate
keyword records. If new text [...] keywords are extracted in the
normal fashion. For all deleted records, the keyword deletion is
performed as in the case of deleted text through incremental
interrogation.

3. Keyword File. An experimental keyword extraction program is
now being tested. This program operates as follows:

(1) Responses to questions are subjected to
keyword extraction under the control of the CEL Program.

(2) Responses are edited to eliminate deleted
lines, to eliminate deleted characters (backspace and retype), to
eliminate carriage returns, and to convert all letters to upper
case. This is done to eliminate mismatches in the keyword list.
For instance, "Computer" without such editing would not match
"computer", and similarly carriage return characters,
backspace characters, or delete characters will eliminate any
possibility of an exact match.

(3) Each word in the response is compared with
words in a common word list. Common words are not stored as
keywords.

(4) Each keyword (i.e., not a common word) is
stored and is tagged with the IEN associated with the question to
which this is a response. If the keyword is already recorded,
the IEN is added to a list of IEN's in which the word appeared.
In addition to the IEN, the keyword could be tagged with its
position within the response. This would enable subsequent
retrieval based on position of keywords in a response.
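
Steps (2) through (4) can be sketched as follows. This Python
fragment is an illustration under stated assumptions and is not
the experimental program: the common word list is a small
stand-in, the IEN "4.2.1" is invented, deleted characters are
assumed to appear as backspace characters in the raw response,
and deleted lines are not handled.

```python
import re
from collections import defaultdict

# Stand-in for the common word list (assumption).
COMMON_WORDS = {"THE", "A", "AN", "OF", "IS", "TO", "AND", "IN"}


def apply_backspaces(text: str) -> str:
    """Remove each backspace together with the character it deletes."""
    out = []
    for ch in text:
        if ch == "\b":
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


def extract_keywords(response: str, ien: str, index=None):
    """Add the keywords of one response to the keyword index.

    The index maps each keyword to a list of (IEN, position) pairs.
    """
    if index is None:
        index = defaultdict(list)
    text = apply_backspaces(response)
    text = text.replace("\r", " ").replace("\n", " ").upper()
    for position, word in enumerate(re.findall(r"[A-Z0-9]+", text)):
        if word not in COMMON_WORDS:
            index[word].append((ien, position))
    return index


index = extract_keywords("The Computer reads\r\nthe compx\buter tape", "4.2.1")
```

After editing, both spellings of "computer" fall under the single
keyword COMPUTER, each tagged with the IEN and its word position.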

4. Searching. Information in ESDP is indexed in four ways: 

a. Program Element

One index to a piece of data is the particular element
(UOP, UOD, etc.) with which it is associated. This information
is obtained through interrogation for design documentation and
through program analysis for final documentation.

b. Keywords

Another index is the keyword index. The keywords are
extracted automatically from responses to interrogation
questions.

c. Data Names and Labels

These are character strings used in the program or the
program design being documented. They, too, serve as indexes to
the UOP or UOD records.

d. Hierarchic Code

ESDP employs a hierarchical coding system and attaches
a code number to each element of data. This number is called an
Information Element Number (IEN). The structure of these
numerical codes is intended to classify any data collected by
ESDP about an object system of programs.

Searching of the ESDP files is requested from terminals 
or ESDP programs. The query language is basically the same 
subset of PL/I as is used for executive programming. Again, the 
IF statement is the core of the subset language. The user may 
express complex Boolean relationships as the conditional
statements. For instance,

IF (A > B) & (C = '1') | (A + C ¬= '7')

is an example of a conditional statement. Here, A, B, and C are
data base items, uniquely identified. The system must be capable 
of so identifying data base elements and will do so through use 
of symbol tables. 

An additional feature of ESDP information retrieval is
the ability to specify a file as the disposition for retrieved
data. Thus, the user can dynamically create files from
information in other files. He may perform cyclic searches by
retrieving data, placing it in a buffer file, and then using the
retrieved elements as search criteria for another search.

Cyclic retrieval is defined as the use of information
retrieved from one query as part of the statement of a subsequent
query to the same or a different file, so that a cycle of query,
retrieval, query based on retrieval data, retrieval, etc., can be
set up.
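
One cycle of such a search can be sketched as follows. The Python
fragment is illustrative only. The two files, their field names,
and their contents are invented for the example, and retrieve is a
hypothetical stand-in for the retrieval function.

```python
def retrieve(records, predicate, fields):
    """Return only the named fields of the records that satisfy predicate."""
    return [{f: r[f] for f in fields} for r in records if predicate(r)]


# Hypothetical file contents.
uod_file = [
    {"name": "SPEED", "used_by": "NAV1"},
    {"name": "SPEED", "used_by": "NAV2"},
    {"name": "FUEL", "used_by": "ENG1"},
]
uop_file = [
    {"name": "NAV1", "author": "SMITH"},
    {"name": "NAV2", "author": "JONES"},
    {"name": "ENG1", "author": "SMITH"},
]

# Query 1: which UOP's use the data item SPEED? Results go to a buffer.
buffer_1 = retrieve(uod_file, lambda r: r["name"] == "SPEED", ["used_by"])

# Query 2: the buffer contents become the search criteria.
wanted = {row["used_by"] for row in buffer_1}
authors = retrieve(uop_file, lambda r: r["name"] in wanted, ["name", "author"])
```

The buffer plays the role of the buffer file: it holds the output
of the first query long enough to state the second.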

Dynamic file creation also allows for creation of small
"customized" files for use only by a particular individual or
organization. These files would be organized for the particular
application at hand.

Outputs of results from searches are highly flexible in
ESDP. A user may specify (1) mode of output (printer, console,
etc.), (2) data manipulations (sorts, totals, other calculations,
etc.), and (3) formatting (page widths, spacing, titles, etc.).

The information retrieval function should be designed
to operate both as a subsystem of other programs and to service
directly queries submitted through on-line terminals or through a
background, or deferred, job queue.

Every effort should be made to integrate the query
language with the Interrogation/Instruction executive language,
using a common syntax and a common interpreter or compiler.
Probably, a compact form of query statement will be used by the
programs, with the human-written form translated into this
compact form. Programs calling information retrieval would
directly fill in this table, somewhat as a calling sequence. The
human user then has the options of writing out his query directly
in this compact language if he wishes and is adept enough,
writing out the query in a more natural form (not natural
language, but a programming-like language), or building his query
gradually through a computer assisted, conversational process.

In regard to performance, while specifications have not 
yet been developed, it seems that the following are required: 

o Records must be retrievable on the
obvious characteristics which are
usually unique identifiers: address,
sequence number within a file, value of
a key or sort field.

o Records must also be retrievable on the
basis of Boolean combinations of these
or other record attributes, each
attribute (probably) being able to be
stated as one or more relationship
statements, such as SALARY = 10000 or
AGE < 40.

o Individual items, fields, arrays,
sub-records, etc., can be specified as the
information to be retrieved from a
record; the entire record need not be
retrieved in response to a query. Thus,
the burden of extracting the exact
information needed from a record is
placed upon the retrieval system, not
the calling program.

o Information called for may be ordered to
be held in a buffer or temporary storage
area for later reference. In particular,
this requirement is imposed to make
cyclic retrieval possible.

o The requestor, whether a person or a 
program, may specify the recipient of 
the information, which need not be the 
requestor. In other words, an IR system 
user may call for the retrieval of 
information and its presentation to some 
other person, output device, or program. 
Thus, the retrieval system can be used 
as an internal message handling system. 

In ESDP the user will specify explicitly what is
usually implicit in a query: the precise data to be retrieved
(not the entire record containing what is wanted), the place to
store it (if other than the printed page or a CRT console), and
the addressee, who is usually the requestor.
It should be noted that these are the requirements for a system
responsive to both human requestors and programs, with one of the
using programs being a query acquisition program that
communicates with the human user.
o Subject of the question: name of UOP or
data element, particular aspect being
questioned.

o Structural relationship, for
cross-referencing purposes, with other
programs or files.

These items, combined with keywords extracted from the response,
give the potential of a very rich keyword index for use in
querying or in automatic dissemination. The same items can be
used to form an index in each published report. These indexes
would, of course, be automatically modified if the basic
documentation were modified, either through interrogation or
program analysis.

We anticipate that some number of standard queries will be
previously written and invoked by the user as he needs them.
Some of these queries may be complete as stored and some may need
completion or assignment of values to parameters through
interrogation.
This type of standard query should be quite easy to
implement. The majority of queries, however, will be
unanticipated. These will be processed through an interpreter
program designed especially to operate on queries expressed in
the CEL. The interpretive approach is dictated since compilation
of queries cannot be performed rapidly enough to permit an
efficient on-line system.

Many information retrieval queries will be of a form in
which a single data file is used and a single IF statement is
sufficient to decide upon record selection. Often, the key of
the record will be given so the desired record may be immediately
retrieved. If the key is not given, the implication is that each
record must be examined for its compliance with the query, a
process considerably shortened by the use of inverted indexes, if
they exist. Checking for the existence of these indexes, and
making use of them, is a function of the interpreter.
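
The choice the interpreter makes between an inverted index and a
full examination of the file can be sketched as follows. The
Python fragment is illustrative; the function names, the record
layout, and the restriction to a single equality test are
assumptions made for brevity.

```python
def build_inverted(records, field):
    """Map each value of `field` to the positions of records holding it."""
    inverted = {}
    for position, record in enumerate(records):
        inverted.setdefault(record[field], []).append(position)
    return inverted


def select(records, field, value, inverted_indexes=None):
    """Return the records for which `field` equals `value`."""
    if inverted_indexes and field in inverted_indexes:
        # An inverted index exists: go directly to the matching records.
        positions = inverted_indexes[field].get(value, [])
        return [records[p] for p in positions]
    # No index: every record must be examined for compliance.
    return [r for r in records if r.get(field) == value]


# Hypothetical file contents.
records = [
    {"name": "NAV1", "author": "SMITH"},
    {"name": "NAV2", "author": "JONES"},
    {"name": "ENG1", "author": "SMITH"},
]
indexes = {"author": build_inverted(records, "author")}
```

Both paths return the same records; only the number of records
examined differs.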

In a typical query, the program will have been written
in skeleton form, and the remaining data is acquired at the time
of invocation. The items acquired are:

o Record selection criteria: a single IF
statement, although containing any number
of clauses.

o The "THEN" functions: what to do with a
selected record, e.g., RETRIEVE items A,
B, C, or retrieve A to B(1,I) ("retrieve
item A and place it in record I of
Buffer File 1").

o The "ELSE" functions: iteration logic
will be built into the original, but the
user can add functions. He may, for
example, choose to retrieve on the basis
of a false IF condition.

o Processing of retrieved output that has
been stored in a buffer, e.g., SORT (B1)
ON B(1,1), i.e., sort file B1 on the
first field of a record.

o Additional commands, such as DEFER,
SAVE, etc.
A third class of documentation is the formal
documentation normally produced at the end of a project, or for
major progress or milestone reports. These are printed much less
often than the others, but require many printing features not
always available on computer-generated documents.



It appears, at this point, that the logical capabilities
represented by existing programs, such as FLOWCHART
[1] and Administrative Terminal System [2], will handle most of
these documentation problems.



ATS offers all needed features except ability to handle
graphics. It offers a much wider choice of type fonts when
printing at a terminal with changeable type elements, and the
ability to underline text. Variation in type fonts for
programming documentation is useful to help distinguish, for
example, between labels or data names and normal English usage,
as SPEED is a data item, but speed is a rate of motion.

AUTOCHART [4] enables the entry of flow charts and
tables. It is designed to accept manually prepared input, hence
should be able to interface smoothly with the interrogation
processor. The designer does his own flow chart layout. The
compensation for the extra work of doing this is a compact chart
organized in the way most meaningful to the author. Tables and
charts can be modified without complete regeneration, using an
updating interrogation, as in CAINT.
VII
FILE PROCESSING



1. Requirements. From an operational viewpoint, ESDP imposes
the following storage/retrieval requirements:

a. Data stored on or transferred to bulk storage must
be directly accessible to satisfy a broad range of user query
requirements and data storage requirements from on-line consoles.

b. Large data base processing capabilities must be
provided in order not to restrict the size of user programming
systems.

c. Evolutionary file growth must be accommodated since
at the outset of the programming development cycle the ESDP files
are empty and evolve as the user's programming system develops.

d. Highly variable record lengths must be allowed
since these are dictated by the varying characteristics of the
programs comprising the user's programming system.

e. The processing cannot rely on predetermined
knowledge of the distribution of search keys used in accessing
data, since these are dictated by the symbol coding conventions
adopted for the user's programming system and by his natural
language responses to interrogation.

f. Certain files are directly related to others. For
example, keywords are related to the UOP in which they were used.
Therefore, access to one may necessitate access to the other.

The ESDP file processor addresses these requirements and attempts
to provide a solution that effectively handles each requirement 
within the total context. While this may not be the optimum 
solution for any given requirement, when considered by itself, it 
does cope with the totality of requirements in an effective 
fashion. 

A total file management or information processing 
system was not considered to be an appropriate development based 
on ESDP requirements. The preferred approach was to develop a 
set of generalized modules to perform discrete functions which 
would be usable throughout the ESDP system. 

Experimental versions of the file processing routines
have been written in the PL/I language. The physical file
processing uses the Basic Direct Access Method (BDAM) [ ] through
PL/I. All data sets are physically organized by regions, where a
region is defined as a unit of storage, equivalent to a disk
track. This equivalence is based on the current PL/I
implementation and may vary as other storage devices are
supported in subsequent implementations.
2. Rationale for ESDP Processor. The following discussion is
limited to accessing techniques for files where data or records
must be directly accessed. It excludes techniques which rely on
sequential data organization and on a total file scan. While the
latter have application in certain classes of retrieval problems,
this is not the case in the ESDP system, since we are dealing
with a large data base and a non-batched query/retrieval
environment.

The accessing problem is one of uniquely locating each
unit of data within a file. Two general techniques can be used
to perform this location function; namely, table look-up and
randomization techniques.

Let us consider randomization techniques first. These
are based on some arithmetic manipulation of the character codes
of the name or key for the data to be located. The manipulation
results in an address for the peripheral storage device at which
the desired data is stored. Numerous techniques are available
for randomizing or manipulating the character codes. Each is
effective in a given application because it is tailored to the
peculiar characteristics of the names or keys used in the
application. However, no techniques exist which guarantee a
unique transformation in every case. To handle the problem of
non-unique transformations, so-called 'chain' processing
techniques must be employed (e.g., hash tables). The
effectiveness of a technique in any given application is
dependent on how well it restricts the size of chains and on the
overflow procedures adopted for chain processing.

For the ESDP system, randomization techniques were
rejected as the basis of the file accessing mode. First, as
noted earlier, the names or keys used in ESDP file accessing are
dictated by the symbol coding conventions adopted for a user's
programming system and his natural language responses to
interrogation. They cannot be predetermined. No known
randomization technique exists which can produce satisfactory
results, given any key set.

Second, randomizing techniques are useful only for a 
single access path to file data (i.e., access through a single 
key set). Because of the nature of the data in the ESDP files, 
multiple access paths must be available to the same data. Thus, 
table look-up techniques would be required to handle the 
secondary key sets and access paths. 

Third, randomizing techniques are more effective in loosely
packed file situations. Efficiency drops sharply as denser
packing is used. The resultant increase in storage requirements
cannot be offset by comparable table look-up storage
requirements. Thus, this technique would unduly tax storage
requirements. File maintenance also becomes a problem if
extensive chains develop.
To avoid this maintenance and reorganization problem,
the ESDP file processor uses a different technique for index
building and searching, which is a take-off on existing list
processing ideas.

In the ESDP file system, the index is treated as a
group of entries which are physically strung together into a
list, not necessarily contiguously, and which are logically
ordered or sequenced by the use of pointers or address indicators
which are appended to each entry. Because of this uncoupling of
the physical and logical ordering of the index (or any list), we
can eliminate the index reorganization problem, and with some
other simple techniques, the index maintenance problem.

A binary tree structure was selected to permit
efficient search strategies, based on binary search techniques.
The form of the index entry (or structure node) adopted for the
ESDP case is shown in Figure 5. Here:

a. The Index Key Field contains the key or name used
to access file data. This field contains such elements as
(1) the names of the Units of Programming (UOP); (2) data
variable names; or (3) descriptor terms (i.e., keywords).

b. The Low Sequence Pointer contains the address of
another index entry whose key is lower in sort sequence than the
key of the record being examined. Similarly, the High Sequence
Pointer contains the address of a record whose key is higher in
sort sequence than the key of the record being examined.

c. The Data Field contains any additional data that is
desired to be stored in the index. For ESDP, this field could be
used for citation lists, disk addresses and allocation and
addressing controls.



Index Key Field

Low Sequence Pointer

High Sequence Pointer

Data Field



Figure 5. Index Entry



This general form was defined to permit the
implementation of a single program to perform index building and
searching of a variety of indices, each of which had a different
specific organization. As is typical in list processing, an
initial pointer or 'anchor' is maintained that points to the
first index entry or head of the list.
3. The ESDP Index Implementation. Typically, list [...]
techniques have been applied to lists which can be [...] in core
memory. For the ESDP problem, the file sizes [...] requirements
are too large to justify core resident [...] Thus, some different
techniques had to be employed which [...] operate with a disk
resident index. First, indices were [...] and these segments were
the units for storing and [...] from disk. The selected segment
size was set at the [...] of the disk unit used. The following
PL/I structure defines the segment format used:



SEGMENT FORMAT

1 STRUCT,
  2 ID BIT (8),          /* Index Identifier */
  2 ANC BIT (8),         /* Next available entry pointer */
  2 MINSTRUCT (N),
    3 KEE CHAR (M),      /* Index Field */
    3 LOW BIT (16),      /* Low Sequence Pointer */
    3 HIGH BIT (16),     /* High Sequence Pointer */
    3 ENT CHAR (O);      /* Data Field */



where

N, M, and O take on different values depending on the particular
index characteristics, such as size of search key, extent of
data field, and maximum number of entries per segment.

ID is a one byte code identifying all segments in the same 
index, 

ANC is a pointer that indicates which entry in the segment
can be used for the next index term to be stored. If the
value of ANC is zero, then the segment has the maximum number
of entries, i.e., it is full.

Each index segment must be initialized before operation as 
follows: 

a. Each index field (KEE) is set to binary zero.

b. Each high sequence pointer is set to binary zero.

c. Each low sequence pointer is set with the subscript
value of the next entry in the array (MINSTRUCT); LOW
(N) is set to binary zero.

d. ANC is given the value 1 (i.e., initially the
first segment entry is to be used for the first index
entry in the segment).
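
The effect of steps a through d is to chain all entries into a
list of available space through their LOW pointers, headed by
ANC. A minimal Python sketch follows. It mirrors the field names
of the PL/I structure, but the dictionary representation, the use
of None for binary zero in KEE, and the name init_segment are
assumptions made for illustration.

```python
def init_segment(index_id: int, n: int) -> dict:
    """Return an initialized segment of n entries, subscripts 1..n.

    Subscript 0 is reserved as the null pointer, so slot 0 is unused.
    """
    entries = [None]
    for i in range(1, n + 1):
        entries.append({
            "KEE": None,                      # a. key cleared
            "HIGH": 0,                        # b. high pointer cleared
            "LOW": i + 1 if i < n else 0,     # c. link to next free entry
            "ENT": None,
        })
    return {"ID": index_id, "ANC": 1, "MINSTRUCT": entries}  # d. ANC = 1


segment = init_segment(index_id=3, n=4)
free_chain = [e["LOW"] for e in segment["MINSTRUCT"][1:]]
```

For a four-entry segment the LOW pointers read 2, 3, 4, 0, so
following the chain from ANC visits every free entry once and
ends at the null pointer.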

A generalized index search routine has been written
which has multiple entry points depending on the number of
discrete indices. (At present, eight indices are maintained: six
corresponding to the six UOP levels, one for data definitions and
one for keywords.) Upon access, this routine reads the first
segment of the specified index, which contains the anchor or 
start of the index list. It begins comparing the passed search 
key against the KEE's (stored keys) in the segment. If this is 
the first entry in the index, then KEE (1) in segment one equals 
binary zero and no match is found. The no match occurs whenever 
the contents of KEE differ from the passed search key. If the 
passed key and the entry in KEE match, then the routine returns 
to the calling program and passes back the subscript value of the 
matching entry and disk address of the index segment, currently 
in core memory. 

When a no match case arises, a check is made to 
determine if the passed key is less than or greater than the 
index entry (KEE) being compared. Then, either the low sequence 
pointer (LOW) or the high sequence pointer (HIGH) is used to 
determine the next entry against which a comparison is to be 
made. Before picking up this next entry the following checks 
have to be made on the pointer value: 

a. If it is equal to binary zero, then we are at a
terminal node, i.e., an entry for the passed key does not exist
in the index. In this case a zero subscript value is returned 
that indicates no matching entry (terminal node) to which a link 
must now be made. The back pointer will be either a negative or 
a positive value, depending on whether the link should be made to 
the terminal node's low or high sequence pointer, currently in 
core memory. As usual, the disk address of the index segment is 
also returned. 

b. If it is less than a threshold value, then the
value is a subscript to an entry within the index segment
currently in core memory. The threshold is currently set at 255,
which is the maximum number of index entries permitted in a given
segment. The search program uses the subscript to pick up the
next entry and repeat the comparison operation.

c. If it is greater than the threshold, then the value 
has a double meaning; namely, it contains the disk address of the 
index segment in which the next entry can be found and the 
subscript value of that entry within the segment. The search 
program uses the disk address to overwrite the current core resident
segment with the new segment. The subscript value is then used 
to pick up the desired entry and repeat the comparison loop. 
If the ANC value for the index segment currently in 
core memory is non-zero, then the new entry can be inserted in 
the current segment and the ANC value is the subscript to this 
available space. Before using the indicated entry space, the 
calling program must replace the current ANC value by the value 
of the LOW pointer in the indicated entry space. Thus, for 
subsequent users, ANC will have an appropriate subscript value 
and continue to point to the next available entry. The new entry 
is initialized as required by the calling program and both LOW 
and HIGH pointers are set to zero, making the new entry a 
terminal node. The returned back pointer value is used to make 
the necessary linkage with the last compared key to preserve the 
logical ordering of the index. 
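
The insertion step just described amounts to taking a slot from a free list that is threaded through the LOW pointers of unused entries. The following Python sketch shows that idea under stated assumptions: the names `Entry`, `Segment`, `new_segment` and `insert_entry` are hypothetical, subscript 0 is reserved to mean "none", and the back pointer is read as the parent's subscript with a negative sign meaning "link through LOW" and a positive sign meaning "link through HIGH". The report gives the sign convention only loosely, so that reading is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    key: str = ""
    low: int = 0    # lower-sequence link, or next free slot while the entry is unused
    high: int = 0   # higher-sequence link


@dataclass
class Segment:
    entries: List[Entry]   # entries[0] is unused so that subscript 0 can mean "none"
    anc: int               # subscript of the first available entry, 0 if none


def new_segment(capacity: int) -> Segment:
    """Build an empty segment whose free slots are chained through LOW."""
    entries = [Entry() for _ in range(capacity + 1)]
    for i in range(1, capacity):
        entries[i].low = i + 1          # slot i points to the next free slot
    return Segment(entries, anc=1 if capacity > 0 else 0)


def insert_entry(seg: Segment, key: str, back_pointer: int) -> int:
    """Take the slot named by ANC, make it a terminal node, link it to its parent."""
    if seg.anc == 0:
        raise RuntimeError("no available entry in this segment")
    slot = seg.anc
    seg.anc = seg.entries[slot].low          # ANC now names the next free slot
    seg.entries[slot] = Entry(key, 0, 0)     # LOW = HIGH = 0: a terminal node
    parent = abs(back_pointer)
    if back_pointer < 0:
        seg.entries[parent].low = slot       # assumed: negative links via LOW
    elif back_pointer > 0:
        seg.entries[parent].high = slot      # assumed: positive links via HIGH
    return slot


seg = new_segment(4)
root = insert_entry(seg, "M", 0)        # first entry, no parent to link
left = insert_entry(seg, "C", -root)    # sorts before "M"
right = insert_entry(seg, "T", root)    # sorts after "M"
```

Because the free list lives inside the entries themselves, no separate table of available space is needed; ANC alone is enough to find the next slot.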
VIII 
DEBUGGING SUPPORT 



1. Introduction. We have described in Volume 3 of this report a 
language designed to facilitate programming of conversational 
processes. This language, called the CAINT Executive Language 
(CEL), is employed in writing executive programs to control the 
logic of question selection and wording, response analysis, and 
various other activities. CEL-written programs require 
debugging, and ESDP is designed to include a support package 
specifically tailored to assist the executive programmer in 
debugging. 




For example, 

IF (Condition) THEN DISPLAY (Variable) ELSE HALT; 

3. Commands. 

a. HALT 

Halt the CEL program and continue with the next 
sequential instruction when the start button on the user's 
console is pressed. 

b. DISPLAY 

Display the contents of named variables (in core or 
external storage). 

For example, 

A = 1; B = 6; C = A + B; 

DISPLAY (C); 

results in a printout of: 

C = 7 

c. ALTER 

Enables the programmer to modify some area of core or 
external storage. 

For example, 
Programmer types: 

C = A - B (continuing the above example) 

C is set equal to -5. 

d. CHANGE 

Enables the programmer to change statements in the CEL 
program being debugged. 

There are three forms of CHANGE defined: 



(1) CHANGE (DELETE (statement number) TO 
(statement number)) 

(2) CHANGE (REPLACE (statement number) TO 
(statement number) WITH source code) 

(3) CHANGE (INSERT AFTER (statement number) 
source code) 
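
The effect of the three CHANGE forms on a numbered program can be sketched as follows. This is an illustrative Python model only, not CEL and not the ESDP implementation. The program is held as an ordered list of (statement number, source text) pairs; the function names are hypothetical, and because the report does not say how newly supplied statements are numbered, the sketch marks them with `None`.

```python
from typing import List, Optional, Tuple

# (statement number, or None for newly supplied code; source text)
Statement = Tuple[Optional[int], str]


def _index_of(prog: List[Statement], number: int) -> int:
    """Position of the statement carrying the given statement number."""
    for i, (n, _) in enumerate(prog):
        if n == number:
            return i
    raise KeyError(number)


def _span(prog: List[Statement], first: int, last: int) -> Tuple[int, int]:
    i, j = _index_of(prog, first), _index_of(prog, last)
    if j < i:
        raise ValueError("range end precedes range start")
    return i, j


def change_delete(prog: List[Statement], first: int, last: int) -> List[Statement]:
    """Form (1): remove statements first through last, inclusive."""
    i, j = _span(prog, first, last)
    return prog[:i] + prog[j + 1:]


def change_replace(prog: List[Statement], first: int, last: int,
                   source: List[str]) -> List[Statement]:
    """Form (2): replace statements first through last with new source code."""
    i, j = _span(prog, first, last)
    return prog[:i] + [(None, s) for s in source] + prog[j + 1:]


def change_insert_after(prog: List[Statement], number: int,
                        source: List[str]) -> List[Statement]:
    """Form (3): insert new source code after the named statement."""
    i = _index_of(prog, number)
    return prog[:i + 1] + [(None, s) for s in source] + prog[i + 1:]


program = [(100, "A = 1;"), (200, "B = 6;"),
           (300, "C = A + B;"), (400, "DISPLAY (C);")]
patched = change_replace(program, 300, 300, ["C = A - B;"])
```

Each function returns a new list rather than editing in place, so the unmodified program remains available if a correction attempt has to be abandoned.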



4. Use of Debugging Capabilities. Data value changes may be 
traced throughout the execution of a program. 

For example, 

DISPLAY (X); 

means display the current value of X every time X 
changes value. 



IF (Y > 5) & (Y < 10) THEN DISPLAY (X); 

means display the current value of X every time X 
changes, but only if Y is greater than 5 and less than 10. In other 
words, rather than inserting this statement in the CEL program 
every time that X is changed, the programmer can state the 
condition and desired action once, and the command will be in 
effect throughout the program execution. 
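
The idea of stating a condition and action once and having it apply at every change of a variable can be modeled with a small watch mechanism. The Python sketch below is an illustration under assumptions, not the ESDP design: the class `TracedVariables` and its methods are hypothetical names, the first assignment to a variable is treated as a change, and the "printout" is collected in a list rather than sent to a console.

```python
from typing import Any, Callable, Dict, List

Condition = Callable[[Dict[str, Any]], bool]


class TracedVariables:
    """Variable store that displays a watched variable whenever it changes."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._watches: Dict[str, List[Condition]] = {}
        self.printout: List[str] = []   # stands in for the console printout

    def display_on_change(self, name: str,
                          condition: Condition = lambda values: True) -> None:
        """Register once: display `name` at every change if `condition` holds."""
        self._watches.setdefault(name, []).append(condition)

    def set(self, name: str, value: Any) -> None:
        changed = name not in self._values or self._values[name] != value
        self._values[name] = value
        if not changed:
            return
        for condition in self._watches.get(name, []):
            if condition(self._values):
                self.printout.append(f"{name} = {value}")


t = TracedVariables()
# Counterpart of the conditional DISPLAY of X governed by the value of Y.
t.display_on_change("X", lambda v: 5 < v.get("Y", 0) < 10)
t.set("Y", 3)
t.set("X", 1)    # Y is outside the range: nothing is displayed
t.set("Y", 7)
t.set("X", 2)    # X changed while Y is in range: displayed
t.set("X", 2)    # no change of value: nothing is displayed
t.set("Y", 12)
t.set("X", 3)    # Y is outside the range again: nothing is displayed
```

Routing every assignment through one `set` method is what lets a single registration cover the whole run; the compiled approach described later in this section achieves a similar effect by calling a checking routine after selected statements.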

Another option is deferral of printout. 

For example, 

DISPLAYD (X); 

means record all value changes of X and print off-line. 

A particular CAINT application might be a deferred display of all 
the questions (in sequence) output for a given run. 

During a debug run some means of data base protection 
would have to be provided. This might take the form of putting 
the data base in a read-only mode for each debug application. 
This would mean that any area of the data base could be read, but 
writing would be directed to a scratch file, and any attempts to 
read "changed" data base records would also be directed to the 
scratch file (onto which they had previously been written). 
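
The read-only scheme with a scratch file behaves like an overlay: writes never reach the data base, and reads consult the scratch area first. A minimal Python sketch of that behavior follows. It uses in-memory dictionaries in place of disk files, and the class name `DebugDataBase` and its methods are hypothetical.

```python
from typing import Any, Dict


class DebugDataBase:
    """Read-only view of a data base with writes diverted to a scratch area."""

    def __init__(self, base: Dict[str, Any]) -> None:
        self._base = base                    # never modified during a debug run
        self._scratch: Dict[str, Any] = {}   # stands in for the scratch file

    def read(self, key: str) -> Any:
        # A record that was "changed" during this run is read from scratch.
        if key in self._scratch:
            return self._scratch[key]
        return self._base[key]

    def write(self, key: str, record: Any) -> None:
        # All writing is directed to the scratch area.
        self._scratch[key] = record


base = {"R1": "original record"}
db = DebugDataBase(base)
db.write("R1", "changed record")
db.write("R2", "new record")
```

After the run, discarding the scratch area restores the state seen before debugging, since `base` was never touched.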

A typical CHANGE and ALTER situation might occur 
following detection of a bug. 

A CHANGE command (for replacement, deletion, or 
insertion of statements) could be used to attempt error 
correction. An ALTER command could then be used to reinitialize 
data variables to reasonable values for the point at which the 
program is restarted (using the BEGIN command). 

5. Methods of Implementation. There are two general methods to 
support the CEL with the debugging capabilities described above: 
compilation and interpretation. 

Some pre-processing would be required to prepare a deck 
for compilation so as to support the debugging capabilities 
described above. This could be coupled with an interrogation 
designed to elicit from the programmer the specific debugging 
requirements for each program. 

Consider a simple example of the kind of preprocessing 
under discussion. 

The programmer writes the following code in which the 
numbers on the left are machine generated statement numbers, to 
be used by the programmer as operands of a CHANGE command. 

S00100 L1: DO; 

S00200 IF UOP.NUMENS = 0 THEN CALL OUT ('NO MEMBERS', 

S00400 UOPAD); ELSE CALL OUT (MEMLIST, NUMENS); 

S00500 IF UOPAD = UOPEND THEN GO TO L2; 

S00700 UOPAD = UOPAD + 1; 

S00800 CALL NEXT (UOPAD); 

S00900 GO TO L1; 

S01000 L2: END 

The following is an example of a dialogue that could 
then take place: 

MSG Type the number of each command to be used 

1. HALT 

2. DISPLAY 

3. ALTER 

RES 2 

MSG Which variables are to be displayed? 

RES UOP.NUMENS 

MSG Which variables are to control the display of UOP.NUMENS? 

RES UOP.NUMENS 

MSG For which new values of UOP.NUMENS is UOP.NUMENS to be 
displayed? 

RES UOP.NUMENS > 0 

MSG Pre-processing has begun 

MSG Compilation has begun 

MSG At which statement do you want execution to begin? (Type 
Number 1 - 10). 

RES 1 

MSG Execution has begun 

Etc. 

The deck produced by the pre-processor looks as 
follows: 

L1L2: PROC OPTIONS (MAIN); 

%INCLUDE DATA (DCL1);          Includes declarations necessary to define 
                               data base references in program. 

%INCLUDE TEMP (CODE);          Includes code to initialize variables in 
                               support of DISPLAY. (This code was 
                               generated as a result of the dialogue 
                               pictured above.) 

%INCLUDE PROG (CHECK);         Includes this system (ESDP) program as 
                               internal procedure. This program 
                               supports the debugging capabilities 
                               described above. 

LAB1(1): L1: DO;               LAB1(1) defines this statement as an 
                               element in the label array LAB1(n) 
                               (where n = 10 in this case). Each 
                               statement in the original will be 
                               prefixed in this way to allow 
                               implementation of the BEGIN command. 

LAB1(2): IF UOP.NUMENS = 0 

         THEN 

LAB1(3): CALL OUT ('NO MEMBERS', 

         UOPAD); 

LAB1(4): ELSE CALL OUT (MEMLIST, 

         NUMENS); 

         CALL CHECK; 

LAB1(5): IF UOPAD = UOPEND 

         THEN 

LAB1(6): GO TO L2; 

LAB1(7): UOPAD = UOPAD + 1; 

         CALL CHECK; 

LAB1(8): CALL NEXT (UOPAD); 

         CALL CHECK; 

LAB1(9): GO TO L1; 

LAB1(10): L2: END; 

In summary: 

The HALT command could have been enabled by the same 
interrogation process which enabled the DISPLAY command. The 
program can still be halted by the programmer by pressing the 
stop button on the console. 

The CHANGE command can be utilized following a halt by 
means of a dialogue with the CHECK routine. 

The DISPLAY command has been selectively enabled 
through interrogation. 

The ALTER command could have been enabled by means of 
interrogation. 

The BEGIN command is supported by embedding the program 
in a label array as shown. 
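
The role of the label array and of the calls to CHECK can be illustrated with a small dispatcher. The Python sketch below is an analogy only, not a translation of the generated deck: statements are held in a list indexed from 1 like the label array, execution may begin at any index (the BEGIN command), and a checking routine is invoked after every statement. Calling it after every statement is a simplification, since the deck shown above calls CHECK only after selected statements. All names in the sketch are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]
# A statement receives the program state and returns the label index to jump
# to, or None to fall through to the next statement.
Statement = Callable[[State], Optional[int]]
Check = Callable[[int, State], None]


def run(labels: List[Statement], state: State, begin: int, check: Check) -> None:
    """Execute from label `begin`, calling `check` after each statement."""
    pc = begin
    while 1 <= pc <= len(labels):
        target = labels[pc - 1](state)
        check(pc, state)                    # stands in for CALL CHECK
        pc = target if target is not None else pc + 1


def increment(state: State) -> Optional[int]:
    state["UOPAD"] += 1
    return None                             # fall through


def loop_back(state: State) -> Optional[int]:
    # Jump to label 1 until UOPAD reaches UOPEND, then fall off the end.
    return 1 if state["UOPAD"] < state["UOPEND"] else None


labels = [increment, loop_back]
trace: List[Tuple[int, int]] = []
state = {"UOPAD": 0, "UOPEND": 3}
run(labels, state, begin=1,
    check=lambda n, s: trace.append((n, s["UOPAD"])))
```

Because every statement is reachable by index, restarting after a CHANGE or ALTER needs no special entry points: the programmer names a statement number and `run` is entered with that value of `begin`.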

The main differences between compilation and 
interpretation are (1) interpretation will incur some 
implementation costs whereas the compiler is essentially free, 
(2) interpretive program CHANGES can be made more rapidly, 
because it is not necessary to recompile and linkage edit, and 
(3) interpretive debugging commands can be entered at execution 
time. 
gation and is a process of eliciting information directly from the programmer, through on-line 
communication terminals. The former provides canonical data about the program's structure. The 
latter provides explanatory material about all aspects of the program, and in the absence of 
canonical data, may provide tentative structural information as well. The conclusion of the study 
group is that ESDP is a feasible concept with present-day technology and that it will materially 
benefit using organizations in the production of programs and in guiding their evolution as 
requirements change. Its value will be greater for larger organizations, whose internal 
communications difficulties tend to cause truly gigantic inefficiencies. Its implementation as a support 
system for such projects would require a significant quantum of investment in order to produce 
these benefits and is predicated on the use of a computer system dedicated solely to the use of 
ESDP. 